Summary

Background Severe acute respiratory syndrome (SARS) is a 
newly emerged disease caused by a novel coronavirus 
(SARS-CoV), which spread globally in early 2003, affecting 
over 30 countries. We have used molecular epidemiology to 
define the patterns of spread of the virus in Hong Kong and 
beyond. 

Methods The case definition of SARS was based on that 
recommended by WHO. We genetically sequenced the gene 
for the S1 unit of the viral spike protein of viruses from
patients with SARS in Hong Kong (138) and Guangdong 
(three) in February to April, 2003. We undertook phylogenetic 
comparisons with 27 other sequences available from public 
databases (Genbank). 

Findings Most of the Hong Kong viruses (139/142), 
including those from a large outbreak in an apartment block, 
clustered closely together with the isolate from a single index 
case (HKU-33) who came from Guangdong to Hong Kong in 
late February. Three other isolates were genetically distinct 
from HKU-33 in Hong Kong during February, but none of 
these contributed substantially to the subsequent local 
outbreak. Viruses identified in Guangdong and Beijing were 
genetically more diverse. 

Interpretation The molecular epidemiological evidence 
suggests that most SARS-CoV from the outbreak in Hong 
Kong, as well as the viruses from Canada, Vietnam, and 
Singapore, are genetically closely linked. Three viruses found 
in Hong Kong in February were phylogenetically distinct from 
the major cluster, which suggests that several introductions 
of the virus had occurred, but that only one was associated 
with the subsequent outbreak in Hong Kong, which in turn 
spread globally. 

Lancet 2004; 363: 99-104 


Departments of Microbiology (Y Guan PhD, J S M Peiris DPhil,
B Zheng PhD, L L M Poon DPhil, K H Chan PhD, S W Leung BSc,
K Y Yuen MD) and Zoology (F Y Zeng PhD, C W M Chan BSc,
M N Chan MPhil, J D Chen PhD, K Y C Chow BSc, C C Hon BSc,
K H Hui MPhil, J Li PhD, V Y Y Li BSc, Y Wang BSc, F C Leung PhD),
University of Hong Kong, Hong Kong SAR, China

Correspondence to: Dr F C Leung, Department of Zoology, University 
of Hong Kong, Pokfulam Road, Hong Kong SAR 
(e-mail: fcleung@hkucc.hku.hk) 


Introduction 

Severe acute respiratory syndrome (SARS) was first 
reported in the Chinese province of Guangdong in 
November, 2002, and caused an outbreak there in 
January to April, 2003. 1 A novel coronavirus (SARS-CoV) 
has been identified as the causative agent of this emerging 
disease. 2,3 In Hong Kong Special Administrative Region, 
the index case of the SARS outbreak can be dated back to 
February, 2003. Since then the disease has become 
pandemic, with (at the time of writing) 8098 patients and 
744 deaths in over 30 countries. 4,5 The complete genomes 
of four SARS-CoV isolates have been sequenced 
independently. 6 Comparison with sequences of other 
human and animal coronaviruses has shown that SARS- 
CoV is distinct from them and warrants classification as a 
new group within the Coronaviridae. 7-9

SARS-CoV is an enveloped, positive-sense, single- 
stranded-RNA virus. The genome is about 27 000 bp, 
encoding the replicase and spike, envelope, membrane, 
and nucleocapsid proteins. 7-9 In most animal and human
coronaviruses, the gene for the spike protein is divided
into amino (S1) and carboxyl (S2) regions; the S1 subunit
is associated with receptor-binding functions, and the S2 
subunit is a conserved transmembrane protein mediating 
fusion of viral and cellular membranes. 10 The spike 
protein is also important in viral entry, pathogenesis, 10 
antiviral immune response, 11 virulence, 12 and cellular 13 or 
even species tropism. 14 The gene for the spike protein is
generally the most variable region of the genome and has
been used for genotyping of coronaviruses, as shown by
studies on human coronaviruses 15 and on avian infectious
bronchitis virus. 16 Therefore, phylogenetic analysis of
SARS-CoV is primarily based on the S1 region of the
spike protein sequence.

We have used molecular biological techniques to 
investigate the phylogenetic relations among SARS-CoV 
isolates from Hong Kong and from three patients from 
Guangdong during the period February to April, 2003. By 
analysis of a 2149 bp fragment of the S1 gene of these
viruses and data available in Genbank, the molecular 
epidemiological relations of these SARS-CoVs could be 
reconstructed. 

Methods 

Patients 

The case definition of SARS we used was modified from 
that of WHO as previously described. 3 Samples (n=138) 
from 137 patients with SARS admitted to hospitals in 
Hong Kong Special Administrative Region were analysed: 
three from February, 64 from March, and 71 from April, 
2003. We also studied three virus isolates obtained from 
three patients in Guangdong Chest Hospital during 
February. All these samples were collected as part of 
routine clinical management. The epidemiological
relations and clinical features of the key patients
associated with the Hong Kong and Guangdong isolates
have been described previously. 1,4,17 Those relevant to this
study are summarised in table 1. These include the
presumed index case patient 1 (HKU-33) and patients
directly related to him (figure 1).

Virus strain  Date of      Clinical      Key epidemiological information
              collection*  sample

GZ-43         Feb 18       NPA           Patient in Guangdong Chest Hospital
GZ-50         Feb 18       NPA           Patient in Guangdong Chest Hospital
GZ-60         Feb 18       NPA           Patient in Guangdong Chest Hospital
HKU-33        Feb 24       NPA           Patient 1; index case in Hong Kong
                                         outbreak; onset of disease Feb 15;
                                         arrived in Hong Kong Feb 21; stayed in
                                         hotel M; admitted to hospital Feb 22
HKU-33867     Feb 24       NPA           Recent travel to Guangdong visiting
                                         his father who died of pneumonia
                                         Feb 22
HKU-36        Feb 28       NPA           Pneumonia 2 days after return from
                                         travel to Guangdong
HKU-39849     March 4      Lung biopsy   Patient 2A; social contact of patient 1;
                                         onset of disease Feb 24
HKU-65        March 9      Throat swab   Amoy Gardens outbreak; onset of
                                         disease March 24
HKU-66        March 9      Stool         Amoy Gardens outbreak; onset of
                                         disease March 24
HKU-55        March 27     Stool         Patient 3A: health-care worker
HKU-56        March 28     Stool         Patient 3B: health-care worker

NPA=nasopharyngeal aspirate. *All dates are in 2003.

Table 1: Origins and epidemiological data of the key virus
isolates used in the study

Patient X, the index case of the large outbreak of 
over 300 cases in an apartment block, Amoy Gardens, 
was probably indirectly linked epidemiologically to 
patient 1. 18-20 To investigate whether apparent differences 
in transmission and clinical presentation of disease 
acquired in Amoy Gardens are related to viral genetic 
differences, the whole genomes of viral isolates from 
two of these patients (HKU-65 and HKU-66) were 
sequenced. The full genomes of four viruses from patients 
with no epidemiological link to patient 1 in Hong Kong 
(HKU-36) and Guangdong (GZ-43, GZ-50, GZ-60) 
were also studied. 

Procedures 

Total RNA for investigation of viral genetic sequences
of GZ-43, GZ-50, GZ-60, HKU-36, HKU-39849,
HKU-65, and HKU-66 was obtained from virus isolates,
whereas all other sequences derived in this study
were obtained by direct RT-PCR amplification of the
S1 gene fragment from the clinical sample. The cDNA
was generated by reverse transcription with random
primers. The first 2149 bp of the spike protein gene
were amplified from the viral cDNA by nested PCR
and characterised by sequencing of the amplification
product. RT-PCR and sequencing procedures were
done as previously described. 9 Three sets of primers
were used for nested PCR and sequencing reactions.
These primers were numbered according to their
nucleotide position relative to the start codon in the
open reading frame of the spike protein gene (table 2).
Viral RNA extraction, PCR amplification, and the
analysis of PCR-amplified products were carried out in
different areas of the laboratory. Measures to avoid
cross-contamination by PCR-amplified DNA were
adopted.

Name        Sequence (starting from 5')

Forward primer
Set 1
  PF138     GGAACTGCTGTAATGTCTCT
  NF72      GGTAGGCTTATCATTAGAGA
  SF30      CAACAGAGTTGTGGTTTCA
Set 2
  PF615     GTTCGTGATCTACCTTCTGG
  NF708     GCCTTTTCACCTGCTCAA
  SF1214    CCAGGACAAACTGGTGTTAT
Set 3
  PF1194    CCAGGACAAACTGGTGTTAT
  NF1321    GGTATCTTAGACATGGCAAGC
  SF1610    GGACTCACTGGTACTGGTGT

Reverse primer
Set 1
  PR946     TCTCACAACATCTCCTGA
  NR868     CAGACGATTTGAGTTCAG
  SF450     CATGGGTACACAGACACATACT
Set 2
  PR1729    CGCAAGGTGAAATGTCTAA
  NR1476    TTGGTAGCCAATGCCAGTAGT
  SR1322    GCTTGCCATGTCTAAGATAC
Set 3
  PR2440    CAGCATCAGCGAGTGTCAC
  NR2300    CTTGAGCGAACACTTCAC
  SR1710    CGCAAGGTGAAATGTCTAA

Table 2: Primer sequences used for nested PCR and
sequencing
Figure 1: Epidemiological data on selected key virus isolates

Schematic representation of the relations between the key patients and the outbreaks in different regions of the world. Dotted arrows indicate uncertain
transmission route, and grey boxes indicate regions of outbreaks. Designations of the viruses are shown in parentheses. The clinical and epidemiological
relations of some of these patients have been described previously. 17 In reference 17 these patients were designated as: patient 1=patient 1; patient
2A=patient 2; patient 3A=patient 6; patient 3B=patient 7.
Figure 2: Phylogenetic analysis of 169 SARS-CoV isolates

Numbers at nodes indicate bootstrap values (%). Branch length shows the genetic distance with reference to the horizontal scale bar. For convenience of
display of a large number of isolates, field isolates with zero genetic distance compared with their subcluster (B1, B2-1, and B2-2) common ancestors are
replaced by the subcluster name with the number of isolates shown in parentheses (highlighted in blue). The full phylogenetic tree of the 169 isolates is
available at http://image.thelancet.com/extras/03art5344webfigure1.pdf. Viral gene sequences highlighted in red represent key isolates described in
figure 1. Sequences previously available in Genbank are shown in bold and italic. Subcluster names are shown on the right of the tree. Total numbers of
isolates included in the cluster are shown in brackets. Cluster transition isolates are indicated by blue dotted arrows. Epidemiological details of key viruses
are provided in table 1 and figure 1. Other viruses sequenced as part of this study are prefixed with F, M, or A, to represent the month of sample collection
(February, March, or April). Sequence alignment of the recurrent critical point mutations that distinguish the groups and subgroups is shown on the right
of the tree. All these mutations are non-silent except at 1026 and 1068 (highlighted in green).
The phylogenetic relations of these viruses, including
the 27 sequences downloaded from Genbank
(http://image.thelancet.com/extras/03art5344webtable.pdf), were
reconstructed on the basis of the nucleotide sequence of
the S1 region. Optimum alignments of the viral nucleotide
sequences were made with MegAlign (version 4.03). 21
The phylogenetic trees based on the optimum alignment 
were constructed by the neighbour-joining method 22 with 
MEGA (version 2.1). 23 Robustness of the trees was 
assessed by bootstrap analysis on 1000 replicates of the 
dataset. Critical point mutations associated with different 
lineages are shown (figure 2). 
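
The bootstrap step described above can be illustrated with a minimal Python sketch. It is not the MEGA implementation: the distance measure (a simple proportion of differing sites), the toy alignment, and all function names are assumptions made for illustration, and support is measured only as how often two sequences remain each other's nearest neighbour, which is a simplification of scoring nodes on a full neighbour-joining tree.

```python
import random

def p_distance(seq_a, seq_b):
    """Proportion of aligned, non-gap sites at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

def nearest_neighbour(alignment, name):
    """Name of the sequence closest to `name`; ties are broken alphabetically."""
    others = [n for n in alignment if n != name]
    return min(others, key=lambda n: (p_distance(alignment[name], alignment[n]), n))

def pair_support(alignment, a, b, replicates=1000, seed=0):
    """Fraction of replicates in which a and b are each other's nearest neighbour."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_replicate(alignment, rng)
        if nearest_neighbour(rep, a) == b and nearest_neighbour(rep, b) == a:
            hits += 1
    return hits / replicates

# Hypothetical toy alignment; these are not SARS-CoV sequences.
toy = {
    "isolate_a": "ACGTACGTAC",
    "isolate_b": "ACGTACGTAT",
    "isolate_c": "TCGAACGGAC",
    "isolate_d": "TCGAACGGTC",
}

support = pair_support(toy, "isolate_a", "isolate_b")
print(f"p-distance a-b: {p_distance(toy['isolate_a'], toy['isolate_b']):.2f}")
print(f"bootstrap support for grouping a with b: {support:.1%}")
```

Resampling whole columns, rather than individual bases, keeps each site's pattern across isolates intact, so a grouping supported by few informative sites receives a correspondingly low percentage.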

The complete genome sequences of six selected virus 
isolates were obtained (primers designed on the basis of 
the genetic sequence of HKU-39849, data not shown). 9 
The sequences were aligned and compared with the ten 
other full virus genomes available in Genbank (figure 3; 
webtable). 

Role of the funding source 

The sponsor of the study had no role in study design; 
collection, analysis, or interpretation of data; or in the 
writing of the report. 

Results 

We found that the viruses divided into two distinct 
clusters with five subclusters, designated A1, A2, B1,
B2-1, and B2-2 (figure 2). Cluster A included ten isolates 
mainly derived from Beijing (BJ) and Guangdong (GZ), 
which were characterised by mutations at nucleotide 
positions 230 and 731, resulting in aminoacid changes. 
Only three viruses in cluster A were from patients 
from Hong Kong (HKU-33867, CUHK-W1, and 
HKU-36; figure 2; http://image.thelancet.com/extras/03art5344webfigure1.pdf).
Since these three patients
were recently returned from Guangdong (table 1) and 
the incubation period of the disease ranges from 1 day to 
14 days, all three probably acquired their infections in 


February directly from Guangdong, rather than from 
Hong Kong. 1,18,24 

The other viruses from Hong Kong grouped together 
in cluster B, together with the virus sequences derived 
from the presumed index case for the outbreak in 
Hong Kong (HKU-33a/b), as well as viruses from 
Canada (Tor2), Singapore (eg, SIN2500), Germany 
(Frankfurt 1), Italy (HSR1), Taiwan, and
Vietnam (Urbani). An isolate from Zhejiang province of 
China (ZJ01) was also in this group. 

The initial nasopharyngeal aspirate collected from 
patient 1 (HKU-33) 7 days after onset of his illness 
had two distinct variants designated HKU-33a and
HKU-33b, falling within subclusters B2-1 and B1,
respectively, with a silent sequence variation at nucleotide
position 1068 of the S1 region (C/T). Subsequent clinical
samples from this patient contained one or other variant.
HKU-33a was detected in a nasopharyngeal aspirate on
day 9 and after death in liver and lung samples on day 25; 
HKU-33b was also detected in an endotracheal aspirate 
and bronchoalveolar lavage sample on day 10. 
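
Why a C/T variation at position 1068 can be silent is easiest to see from codon arithmetic. The sketch below assumes that positions are counted from the first base of the start codon, so that position 1068 is the third base of codon 356. The toy coding sequence and the function names are hypothetical and are not taken from the SARS-CoV genome.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codon_position(nt_pos):
    """Map a 1-based nucleotide position to (codon number, base within codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1

def is_silent(cds, nt_pos, new_base):
    """True if replacing the base at 1-based nt_pos leaves the encoded residue unchanged."""
    codon_number, offset = codon_position(nt_pos)
    start = 3 * (codon_number - 1)
    old_codon = cds[start:start + 3]
    new_codon = old_codon[:offset - 1] + new_base + old_codon[offset:]
    return CODON_TABLE[old_codon] == CODON_TABLE[new_codon]

toy_cds = "ATGGCCAAA"  # hypothetical coding sequence: Met-Ala-Lys

print(codon_position(1068))         # (356, 3): third base of codon 356
print(is_silent(toy_cds, 6, "T"))   # GCC -> GCT, both alanine: True
print(is_silent(toy_cds, 5, "A"))   # GCC -> GAC, alanine to aspartate: False
```

Third-position changes are often, though not always, silent because of the redundancy of the genetic code, so the check has to compare the translated residues rather than rely on position alone.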

The viral sequences derived from cases directly linked 
to patient 1 (figure 1) were further subdivided into two 
subclusters B1 and B2 (figure 2). Viruses including
HKU-39849, HKU-55, HKU-56, Urbani, and CUHK-Su10 18
were all grouped in cluster B1 in the same sublineage as
HKU-33b. Tor2 was in cluster B2 in the same lineage as 
HKU-33a. 

Four viruses, BJ04, BJ02, M-55696, and A-67428,
carried the characteristic residues of two subcluster
prototypes and could not be assigned unambiguously to
any one subcluster; they were defined as subcluster
transition viruses (figure 2).

Figure 3: Nucleotide and aminoacid substitutions of SARS-CoV isolated in Amoy Gardens

Comparisons were made with other virus isolates by full genome sequence alignment. Except for the Hong Kong and Guangdong samples, the index case
or first uploaded sequence from other cities and countries was chosen. Nucleotides were numbered on the basis of the HKU-39849 full sequence. 9 Slash
indicates that the sequence data are not available, and nucleotides highlighted in yellow are those that differ from the majority at the corresponding
location. Only substitutions occurring in more than one virus are shown.

In total, 56 single nucleotide variations were identified
within the S1 region of the 169 viruses analysed (from
168 patients, including 33a and b). Of these, 15 were silent
(http://image.thelancet.com/extras/03art5344webfigure2.pdf)
and 41 occurred in only one isolate.
A 3 bp deletion was identified at position 215-217
in the S1 gene in viruses GZ-43, GZ-60, and M-61576.
GZ-43 and GZ-60 were isolated from nasopharyngeal 
aspirates of two health-care workers in Guangdong 
Chest Hospital on Feb 18, 2003, and are probably 
epidemiologically related. The M-61576 sequence was 
derived by direct RT-PCR and sequencing from the urine 
sample of a patient from the Amoy Gardens outbreak in 
April, 2003, which is geographically and epidemiologically 
unrelated to the two Guangdong viruses. This 3 bp 
deletion leads to deletion of an aminoacid residue and a 
change of the residue next to it. 
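
The effect of such a deletion follows from where it falls relative to codon boundaries. If positions are counted from the first base of the gene (an assumption), positions 215-217 cover the second and third bases of codon 72 and the first base of codon 73, so the two codons are fused into one. The Python sketch below reproduces that geometry on a hypothetical toy sequence, with the deletion placed at the same codon offset. The sequence and function names are illustrative only.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(cds):
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))

def delete_range(cds, first, last):
    """Remove the bases at 1-based inclusive positions first..last."""
    return cds[:first - 1] + cds[last:]

# Hypothetical toy gene: ATG GCT AAA GGT CTT encodes M-A-K-G-L.
toy = "ATGGCTAAAGGTCTT"

# Delete positions 5-7: the last two bases of codon 2 and the first base of codon 3.
shortened = delete_range(toy, 5, 7)

print(translate(toy))        # MAKGL
print(translate(shortened))  # MEGL: one residue lost, the adjacent one changed
```

Because three bases are removed, the reading frame downstream is preserved; only the two codons that the deletion straddles are affected.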

Isolates HKU-65 and HKU-66, derived from the throat 
and stool samples, respectively, of two patients from the 
Amoy Gardens outbreak, were completely genetically 
sequenced (figure 3). With HKU-39849 as the reference 
strain, there were seven sequence variations. Four were 
recurrent variations that were found in more than one 
virus (figure 3), and the other three were non-recurrent 
(data not shown) and located at positions 20485 
(ORF1ab) and 27384 (X2) in the genome of HKU-65
and at position 16325 (ORF1ab) in HKU-66. Another
characteristic sequence variation was located in the 
3' untranslated region (nucleotide 29725) of both 
genomes (G to C). This mutation occurred in only three 
Guangdong-related virus isolates, GZ-43, GZ-50, and 
HKU-36. However, these three viruses did not cluster 
together in the phylogenetic analysis of the S1 gene
region, and the patients do not share a known 
epidemiological linkage (table 1). 

Discussion 

The molecular epidemiological analysis suggests that 
there were introductions of several virus strains into 
Hong Kong during February, 2003—HKU-39849, 
HKU-33867, HKU-36, and CUHK-W1. However, 
it also supports the conclusion derived from the 
conventional epidemiology 4 that the SARS outbreak in 
Hong Kong during March and April was largely, if not 
completely, derived from a common source. 

Although only one of the four patients diagnosed with 
SARS in Hong Kong in February had viruses within 
cluster B, all 65 patients diagnosed in March (including
HKU-39849 and CUHK-Su10 viruses previously
sequenced) and 71 patients diagnosed in April had 
viruses of cluster B, together with the viruses (HKU- 
33a/b) derived from the presumed index case patient 1. 
Although molecular epidemiology cannot pinpoint 
the initiator of this outbreak with precision, the first 
virus from cluster B identified in Hong Kong was 
that isolated from patient 1 (HKU-33). Thus, taken 
together with the epidemiological data, the molecular 
epidemiology is compatible with the suggestion 4 that 
patient 1 was the index case of the outbreak in Hong 
Kong. The viruses in Canada, Singapore, Taiwan, and 
Vietnam also belong to cluster B, so the evidence 
supports the contention that these outbreaks derived 
from patient 1. 

Our findings also reveal greater genetic diversity 
among strains in mainland China. Viruses from
Guangdong and Beijing mainly grouped in
cluster A, whereas a Zhejiang (ZJ01) isolate grouped in
cluster B. Further analysis of strains from mainland
China is needed to consolidate the overall patterns of
viral spreading, but the greater genetic diversity of viruses
in mainland China implies that SARS-CoV has been 
circulating there for a while. We propose that a single 
virus strain (HKU-33a/b) from this diverse pool infected 
patient 1, who then travelled to Hong Kong and initiated 


the outbreak there, which then rapidly spread to other 
countries around the world. 

The three patients diagnosed during February with 
viruses in cluster A (HKU-33867, CUHK-W1, and 
HKU-36) all had epidemiological evidence that suggested 
infection had been acquired directly from Guangdong, 
and they represent evidence for several independent 
entries of SARS-CoV into Hong Kong from Guangdong 
during February. Why these viruses did not spread in the 
Hong Kong population when HKU-33a/b did so is 
unclear. Epidemiological data suggest that a few index 
cases caused a disproportionate number of secondary 
cases, the so-called “super-spreading incidents”. 24 The 
initiation by HKU-33a/b in patient 1 of the first
super-spreading incident in Hong Kong may have been simply a
matter of chance. Once this had occurred, second
super-spreading events (as happened in Prince of Wales
Hospital and in the Amoy Gardens apartment block) 
were stochastically more likely to follow from this cluster. 
An alternative hypothesis is that the viruses of the 
HKU-33a/b lineage (cluster B) are biologically more 
predisposed to initiating super-spreading events. This 
hypothesis needs more detailed investigation. Whether the 
presence of more than one genetic variant in patient 1 is 
relevant to the rapid spreading of this virus strain also 
requires further investigation. 

There was an indication that the epidemiology and the 
clinical range of illness associated with the large outbreak 
in the high-rise apartment block in Amoy Gardens 
differed from that reported previously. 19,20 Sequence 
variations in the S1 gene are associated with significant
changes in tissue tropisms of animal coronaviruses. 25,26 
Therefore, investigation of whether the isolates from 
Amoy Gardens patients had significant variation in the 
gene for the spike protein was crucial, in case it could 
explain the presumed difference in disease phenotype. 
However, sequence comparison of the two Amoy Gardens 
isolates did not reveal any significant variations within the 
S1 gene (figure 2) nor across the rest of the genome
(figure 3). This virus is therefore unlikely to have arisen 
from a different source from that of the major Hong Kong 
outbreak. The explosive transmission of the virus during 
the Amoy Gardens outbreak may be explained by an 
unusual route of transmission. 20 Aerosolisation of 
contaminated sewage 20 and a role for animal vectors such 
as rodents 27 have been proposed. 

The use of molecular epidemiology to complement 
conventional epidemiology provides additional 
understanding of the transmission of the disease and will 
have a major effect on controlling the spread of disease. 
The findings of this study lay the foundation for 
understanding of the evolution and the genetic diversity of 
SARS-CoV. 